Conversation
…ail cursor Decouple the video NACK retransmit ring from NANORTC_OUT_QUEUE_SIZE so IoT targets can shrink the NACK window without starving the output dispatch queue. Default equals NANORTC_OUT_QUEUE_SIZE — fully behavior-preserving. - nanortc_config.h: new macro with ifndef guard, ESP-IDF Kconfig hook, power-of-two + min-4 validation. Caller-drain invariant documented. - nanortc.h: resize pkt_ring[] and pkt_ring_meta[] to the new macro, add pkt_ring_tail field (memset-zeroed by nanortc_init). Write-path and NACK-scan updates land in the follow-up commit. Part of Phase 8 PR-3 (docs/exec-plans/active/phase8-continued-optimization.md).
… + NACK paths - nano_rtc.c video_send_fragment_cb: derive slot from pkt_ring_tail (mask NANORTC_VIDEO_PKT_RING_SIZE - 1) instead of out_tail, advance the cursor after stamping meta. out_queue slot selection inside rtc_enqueue_transmit is unchanged. - nano_rtc.c NACK retransmit scan: iterate over NANORTC_VIDEO_PKT_RING_SIZE so scan bounds track pkt_ring actual size (not the unrelated output queue). - Kconfig: expose CONFIG_NANORTC_VIDEO_PKT_RING_SIZE, defaulting to the existing OUT_QUEUE_SIZE value for backward-compat. Verified: default build (PKT_RING=OUT_QUEUE=32) and override build (-DNANORTC_VIDEO_PKT_RING_SIZE=16 with OUT_QUEUE=32) both build clean and pass 21/21 host tests. Part of Phase 8 PR-3 (docs/exec-plans/active/phase8-continued-optimization.md).
…IZE knob
Adds three focused unit tests in test_media.c (under #if NANORTC_FEATURE_VIDEO)
that mirror the nano_rtc.c write-path / NACK-scan idioms against a zeroed
nanortc_t:
- in-window lookup: recently-stamped seqs are findable.
- out-of-window miss: after 2× ring wrap, old seqs are gone without crash.
- independence from out_tail: pkt_ring_tail advances alone; out_tail stays
at whatever the caller set; slot 0 holds the newest wrap's seq.
Tests pass on both default build (PKT_RING=OUT_QUEUE=32) and IoT override
(-DNANORTC_VIDEO_PKT_RING_SIZE=16 with OUT_QUEUE=32), which exercises the
wraparound path impossible to hit at the default.
Docs: add a "Video NACK ring — tunable independently of the output queue"
section to memory-profiles.md covering the macro, example override,
safety invariant (Sans-I/O caller-drain contract), and expected ~19 KB
saving at PKT_RING_SIZE=16. Tweak the Biggest Contributors and Kconfig
tables to point at the new macro.
Part of Phase 8 PR-3 (docs/exec-plans/active/phase8-continued-optimization.md).
Copilot review on PR #54 flagged that the "override to 8 saves ~19 KB" claim in the Kconfig help text is wrong for ESP-IDF defaults — with OUT_QUEUE_SIZE=16 (video) the 16→8 drop at MEDIA_BUF_SIZE=1232 only saves ~9.6 KB. The ~19 KB figure matches the host-default 32→16 case. Rewrite the Kconfig help text around the formula (OUT_QUEUE_SIZE - PKT_RING_SIZE) × MEDIA_BUF_SIZE so it stays accurate across host and ESP-IDF configurations, with a Kconfig-correct 9.6 KB worked example. memory-profiles.md already targets host overrides, but extend its inline comment to spell out both host (32→16 ≈19 KB) and Kconfig (16→8 ≈9.6 KB) numbers so readers in either context pick the right one. Addresses #54 (comment) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address PR #54 review concern: when NANORTC_VIDEO_PKT_RING_SIZE is set below the per-frame fragment count, video_send_fragment_cb silently wraps pkt_ring while out_queue[].transmit.data still references the stale slots — the receiver gets newer fragment bytes carrying the older fragment's RTP seq, manifesting as glitched IDRs. Sans-I/O drain doesn't save you because nanortc_send_video() emits every FU-A fragment of one access unit before returning to the caller. Add a runtime guard: when (out_tail - out_head) >= PKT_RING_SIZE, the ring would wrap into a slot whose pointer is still pending. Bump nanortc_t.stats_pkt_ring_overrun (atomic, matches existing stats_* fields) and emit a single static-string NANORTC_LOGW so under-sizing shows up in integration smoke tests rather than as garbled keyframes on the wire. Counter is exported on the struct for app glue to inspect. Cannot prevent the corruption from this layer (the slot is already about to be written), but turning silent-bad-video into a debuggable signal matches the LOG-injection-for-AI-debug use case the rest of the library follows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror video_send_fragment_cb's write path (pkt_ring slot + meta stamp +
out_queue pointer enqueue) in test_media.c so the test exercises the
actual pointer-aliasing surface, not just the NACK-scan ring semantics.
Two new tests:
- test_pkt_ring_overrun_counter_fires_when_undersized: pushes
PKT_RING_SIZE+3 fragments without draining out_queue, asserts the
counter trips exactly 3 times — guards the contract enforced in
nano_rtc.c.
- test_pkt_ring_aliasing_corruption_demonstrated_without_guard:
documents *why* the guard exists. Without draining, out_queue[0]'s
.data pointer still addresses pkt_ring[0], which has since been
overwritten by the wrapped fragment. Asserts the byte at .data[0]
is the latest writer's marker, not the original. This is the
silent-corruption pattern the runtime guard surfaces.
Both pass under default config (PKT_RING=OUT_QUEUE=32, no overrun
because out_queue fills first) and under the IoT override
(-DNANORTC_VIDEO_PKT_RING_SIZE=8 -DNANORTC_OUT_QUEUE_SIZE=32, where the
overrun path actually fires).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ent bound Replace the "drain each tick" framing in Kconfig + memory-profiles.md with the precise hard constraint: PKT_RING_SIZE >= ceil(max_frame_bytes / NANORTC_VIDEO_MTU) + 1 The previous wording implied the Sans-I/O drain contract was sufficient, but nanortc_send_video() emits every FU-A fragment of one access unit before returning to the caller — there is no opportunity to drain mid-frame. Sizing PKT_RING_SIZE below the per-frame fragment count silently corrupts every IDR (older fragment buffers are overwritten while their pointers are still queued in out_queue). Add a worked-numbers table for 480p/720p/1080p H.264 at MTU 1200, and soften the "500 ms NACK history at 30 fps" claim — that figure only held for single-NAL frames; multi-NAL streams cover much less wall- clock time per slot. Cross-reference the new stats_pkt_ring_overrun counter so integrators can wire it into smoke tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Close out PR-3 in the Phase 8 exec plan. Key corrections vs the original
draft:
- The draft step 3 recommended "PKT_RING=16 at 30 fps = ~500 ms NACK
window" as the IoT sizing rule. That claim only held for single-NAL
frames — on 720p+ streams with multi-NAL IDRs it silently corrupted
every keyframe because the ring would wrap while out_queue[].data
pointers still referenced the overwritten slots. Replaced with the
real hard constraint:
PKT_RING_SIZE >= ceil(max_frame_bytes / NANORTC_VIDEO_MTU) + 1
plus a worked-numbers table (480p ≈ 16, 720p ≈ 32, 1080p ≈ 64),
landing in memory-profiles.md.
- Added a runtime overrun guard + stats_pkt_ring_overrun counter so
under-sizing surfaces in integration smoke tests rather than as
glitched IDRs on the wire.
- Grew the test count from the planned single test_nack_ring to five
tests in test_media.c, including an aliasing-corruption witness
that fails without the guard.
Also update docs/PLANS.md line 16 from "Planning" to track PR-3 and
PR-5 as landed so the phase's next-action surface is accurate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Decouple video NACK ring from output queue size
Resolve nano_rtc.c conflict by keeping the explanatory comments around the pkt_ring overrun guard. The code logic is identical to origin/main (both branches independently landed the same overrun-detection logic); the comments document the aliasing-corruption rationale and NACK seq invariant that motivate the guard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address Copilot review on PR #57: - nanortc_config.h: replace the misleading "drain each tick" framing on NANORTC_VIDEO_PKT_RING_SIZE with the actual hard sizing rule (ceil(max_frame_bytes / VIDEO_MTU) + 1). nanortc_send_video() emits every FU-A fragment of an access unit before returning, so callers cannot drain mid-frame; sizing must be based on worst-case fragments-per-frame, and savings depend on OUT_QUEUE_SIZE rather than a fixed ~19 KB number. - Kconfig: rephrase "Set the runtime overrun counter" to "Monitor stats_pkt_ring_overrun" — the counter is incremented internally and is only readable by integrators, not user-settable. - tests/test_media.c: rename test_pkt_ring_aliasing_corruption_demonstrated_without_guard to test_pkt_ring_aliasing_corrupts_pending_pointers_when_undersized. The previous name implied the helper bypassed the guard, but the test uses pkt_ring_simulate_send_with_guard() and the guard does fire — what the test actually demonstrates is that the runtime guard *cannot prevent* corruption once PKT_RING_SIZE is undersized; it only surfaces the problem via stats_pkt_ring_overrun. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pkt_ring overrun-guard + slot-allocation pattern, plus the matching NACK metadata stamp and pkt_ring_tail advance, were duplicated between the H.264 zero-copy loop in rtc_send_video (added by PR #56 / fbb5b4f) and the H.265 callback video_send_fragment_cb (added by PR #54 / c1cb699). The verbose comment block explaining the aliasing-corruption invariant lived only at the H.265 site, leaving the H.264 site under-documented. Extract: static uint8_t *pkt_ring_alloc_slot(nanortc_t *, uint16_t *out_pslot) — runs the (out_tail - out_head) >= PKT_RING_SIZE check, bumps stats_pkt_ring_overrun + emits NANORTC_LOGW on overrun, returns rtc->pkt_ring[pslot]. static void pkt_ring_commit_slot(nanortc_t *, uint16_t pslot, uint16_t wire_seq, uint16_t srtp_len) — stamps pkt_ring_meta and increments pkt_ring_tail. Both helpers live at the top of the #if NANORTC_FEATURE_VIDEO block, ahead of the H.265 nesting, so both writers see the same definition. The aliasing-corruption rationale now lives once at pkt_ring_alloc_slot where both callers can find it. Behavior identical: same guard logic, same overrun counter increment, same slot index derivation, same meta layout. Verified across the default + ASan + ICE_SRFLX-off + H.265 feature combos (21 / 21 host tests, 23 / 23 with H.265 enabled, 15 / 15 ci-check.sh --fast). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- docs/PLANS.md line 16: Phase 8 status now lists PR-4 as landed 2026-04-25 (PR #56 / fbb5b4f), matches PR-3 / PR-5 entries. - docs/exec-plans/active/phase8-continued-optimization.md: - Tag PR-4 as [COMPLETED 2026-04-25] in both the summary table and the section header. - Replace the speculative "Approach" bullets with the landed approach (Option C — fragment iterator), measured savings, test additions, and verification record. - PR-3 step 5: note the runtime overrun guard was helper-extracted into pkt_ring_alloc_slot in PR #57 and is now shared between the H.264 and H.265 writers. - PR-3 step 6: rename aliasing_corruption_demonstrated_without_guard to aliasing_corrupts_pending_pointers_when_undersized to match the test rename in ba32abf. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nsctp_handle_timeout() transitions SCTP to CLOSED after exceeding
NANORTC_SCTP_MAX_RETRANSMITS but never notified the upper layer —
applications only learned about the loss on the next
nanortc_datachannel_send() failure, a silent stall from the user's
point of view. Phase 8 PR-2.
Wire the signal through:
- nano_sctp.h: new bool closed_due_to_failure on nano_sctp_t, set
exactly once when retransmit exhaustion trips the CLOSED transition.
Init-time CLOSED (line 480) and peer-ABORT-induced CLOSED (line 982)
deliberately do NOT set this — they have separate signaling channels;
rationale recorded inline.
- nano_sctp.c: set the flag alongside the existing state=CLOSED in the
retransmit-exhaustion branch.
- nano_rtc.c: in rtc_process_timers, after the SCTP timer block, read
and clear the flag. If the RTC was previously NANORTC_STATE_CONNECTED,
emit NANORTC_EV_DISCONNECTED and transition to NANORTC_STATE_CLOSED.
Mirrors the existing ICE-failure / consent-expiry paths. Critical
detail: emit block lives outside the if (sctp.state == ESTABLISHED)
guard — once the timeout flips state to CLOSED, that guard goes false
on the next tick and the signal would otherwise be lost.
- tests/test_sctp.c:
* test_sctp_timeout_sets_closed_flag (under #if NANORTC_FEATURE_DC_RELIABLE)
walks the full RTO ramp (1000 → 2000 → 4000 → 8000 → 10000 capped)
through MAX_RETRANSMITS cycles of nsctp_handle_timeout + nsctp_poll_output,
asserts CLOSED + flag on the threshold-trip iteration.
* test_sctp_abort_does_not_set_failure_flag — scope guard ensuring
peer ABORT does not regress into the failure flag.
Verified across DC_RELIABLE=ON (test_sctp 56/56), DC_RELIABLE=OFF
(test_sctp 55/55, Test 1 correctly skipped), and scripts/ci-check.sh
--fast (15/15 incl. ASan + ICE_SRFLX=OFF + format).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- docs/PLANS.md line 16: Phase 8 status now lists PR-2 (SCTP failure
event propagation) as landed 2026-04-25, alongside the existing
PR-3 / PR-4 / PR-5 entries; only PR-1 (IoT memory profile) remains
pending in the recommended-order set.
- docs/exec-plans/active/phase8-continued-optimization.md:
- Tag PR-2 as [COMPLETED 2026-04-25] in the section header.
- Replace the speculative "Approach" bullets with the landed
implementation: closed_due_to_failure flag, RTC emit block placed
outside the ESTABLISHED guard (the non-obvious detail), the two
new regression tests, and verification record (test_sctp 56/56
with DC_RELIABLE=ON, 55/55 with DC_RELIABLE=OFF, ci-check.sh
--fast 15/15).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both harnesses are already registered in tests/fuzz/CMakeLists.txt (fuzz_turn in FUZZ_CORE_TARGETS, fuzz_h265 under FUZZ_H265_TARGETS, gated by NANORTC_FEATURE_H265). The CI fuzz job already configures H265=ON, so adding the two harnesses to the run loop is the only remaining step. Closes review §4 P1-1 (docs/reviews/2026-04-26-overall-review.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spins up the existing tests/interop/turn-server/docker-compose.yml coturn (via scripts/start-test-turn.sh) and runs ctest -L turn-relay against test_interop_turn_relay_nanortc.c (5 cases). Phase 5.2 F6-F10 regressions are now caught in CI; previously they only had hand verification on a downstream camera SDK. Updates docs/engineering/turn-rfc-compliance.md to reflect the new coverage status (the 'no automated test' note was stale once the 744-LOC harness landed). Closes review §4 P1-2 + P1-5 (docs/reviews/2026-04-26-overall-review.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rtc_begin_dtls_handshake() previously set rtc->state to DTLS_HANDSHAKING before calling dtls_start(). On dtls_start failure the function returned early but state was already advanced — leaving a half-initialised DTLS context that subsequent handle_input() bytes would drive down the DTLS path. Reorder so state is only advanced after dtls_start() succeeds; failure leaves state at ICE_CONNECTED so the caller can retry cleanly. Failure-path mock test deferred until a mock crypto provider lands (no current way to make dtls_init / dtls_start fail in tests). Closes review §4 P1-4 (docs/reviews/2026-04-26-overall-review.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without explicit dtls_destroy() in nanortc_ice_restart(), the "if (!rtc->dtls.crypto_ctx)" guard in rtc_begin_dtls_handshake would skip dtls_init on the next handshake and reuse the previous DTLS context — possible key-material reuse / orphan handshake timers. dtls_destroy is a no-op when crypto_ctx is NULL, so calling it unconditionally is safe even before any DTLS init. Adds two regression tests: - test_api_ice_restart_clears_dtls: dtls_init -> ice_restart asserts crypto_ctx is NULL and dtls.state is CLOSED post-restart. - test_api_ice_restart_no_dtls_safe: ice_restart without prior dtls_init does not crash. Closes review §4 P2-3 (docs/reviews/2026-04-26-overall-review.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CRC modules (nano_crc32.c / nano_crc32c.c) carried 100% line coverage indirectly via fuzz_stun (FINGERPRINT verify) and fuzz_sctp (chunk checksum), but had no direct algorithm-vector test. An accidental polynomial / initial-value / final-XOR change could pass indirect fuzz round-trip while still being algorithmically wrong. Adds 13 cases pinning canonical RFC vectors: - CRC-32 ISO HDLC (RFC 1952): empty, "123456789" (0xCBF43926), "a" (0xE8B7BE43), "abc" (0x352441C2) - CRC-32c Castagnoli (RFC 3309 / RFC 3720 §B.4): empty, "123456789" (0xE3069283), 32x0x00 (0x8A9136AA), 32x0xFF (0x62A8AB43), incrementing/decrementing 32-byte buffers - Incremental API: segmented == one-shot (the 3-segment pattern used by nsctp_verify_checksum), zero-length update tolerance, byte-at-a-time fragmentation Closes review §4 P3-1 (docs/reviews/2026-04-26-overall-review.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test_crc32.c calls nano_crc32c{,_init,_update,_final}, which live in
src/nano_crc32c.c and are only compiled when NANORTC_FEATURE_DATACHANNEL
is ON. Without gating, AUDIO_ONLY / MEDIA_ONLY / CORE_ONLY builds and
the architecture-constraints DC-OFF check fail to link.
Add test_crc32 to the DC_TESTS list so the existing continue() guard
skips it in DC-OFF configs, mirroring test_datachannel. The 13 CRC
assertions still run in DATA / AUDIO / MEDIA / MEDIA_H265 — RFC vectors
are configuration-invariant, so no coverage is lost.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builds esp32_datachannel across esp32 / esp32s3 / esp32p4 and esp32_audio + esp32_video on esp32s3 via espressif/esp-idf-ci-action on push/PR to main. Catches component / Kconfig / lwIP regressions without requiring hardware; on-device Python E2E remains manual. esp32_camera is intentionally excluded — its board_manager autogen + camera-sensor selection is board-specific and will land separately.
There was a problem hiding this comment.
Pull request overview
This PR hardens the DTLS/ICE-restart lifecycle in nanortc and expands validation/CI coverage (CRC vectors, fuzz harnesses, and TURN-relay interop) to better catch regressions in core networking and crypto-adjacent paths.
Changes:
- DTLS lifecycle: destroy DTLS context on
nanortc_ice_restart(); only advance RTC state toDTLS_HANDSHAKINGafterdtls_start()succeeds. - Tests: add direct CRC-32/CRC-32c RFC-vector unit tests; add ICE-restart tests asserting DTLS teardown behavior.
- CI/docs: extend fuzz job to include
fuzz_h265andfuzz_turn; add a dedicated coturn-backed interop job and update TURN compliance documentation accordingly.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/nano_rtc.c |
Adjusts DTLS handshake state progression and destroys DTLS state during ICE restart. |
tests/test_trickle_ice.c |
Adds regression tests around ICE restart clearing DTLS state and safety when DTLS is uninitialized. |
tests/test_crc32.c |
Introduces standalone CRC-32/CRC-32c RFC-vector tests plus incremental API checks. |
tests/CMakeLists.txt |
Adds test_crc32 under the DataChannel-gated test set. |
docs/engineering/turn-rfc-compliance.md |
Updates TURN relay coverage narrative to reflect new nanortc-as-client relay tests and CI job. |
.github/workflows/ci.yml |
Expands fuzz harness set and adds a new interop-turn-relay job running coturn-backed labeled tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The custom partition table reserves a 0x3a0000 (~3.6 MB) factory app, ending at 0x3b0000. ESP-IDF's default flash size is 2 MB, so a clean build from sdkconfig.defaults fails partition-table generation with: Partitions tables occupies 3.7MB of flash (3866624 bytes) which does not fit in configured flash size 2MB. The locally-tracked sdkconfig was set to 16 MB, which masked this in manual builds; the new ESP-IDF CI workflow starts from defaults and caught the regression. Set CONFIG_ESPTOOLPY_FLASHSIZE_4MB=y as the minimum size that fits the partitions; boards with larger flash override via menuconfig. Verified: gen_esp32part.py with --flash-size 4MB succeeds; with 2MB reproduces the CI error exactly.
esp32_video's main/CMakeLists.txt packs frames from examples/sample_data/h264SampleFrames and opusSampleFrames into video.blob / audio.blob at build time. examples/sample_data is a git submodule (0x1abin/media_sample_data); without `submodules: recursive`, actions/checkout@v4 leaves the path empty and the build fails: Error: examples/sample_data/h264SampleFrames is not a directory Error: examples/sample_data/opusSampleFrames is not a directory The other matrix targets don't reference sample_data, but enabling submodules at the workflow level keeps the step uniform; the submodule is ~15 MB so the cost is negligible.
dtls_destroy() drops the DTLS context but leaves rtc->sdp.local_fingerprint and rtc->dtls.local_fingerprint populated. rtc_cache_fingerprint() is a write-once cache (early-returns when sdp.local_fingerprint is non-empty), so the next create_offer / accept_offer after an ICE restart would re-advertise the *previous* cert's SDP fingerprint while a fresh dtls_init() generates a new cert and fingerprint. The peer then rejects DTLS handshake with a fingerprint mismatch. Memset both fingerprints to '\0' immediately after dtls_destroy() so rtc_cache_fingerprint() repopulates them from the next dtls_init's cert. Test: extend test_api_ice_restart_clears_dtls (T20) — pre-populate both fingerprint buffers before calling nanortc_ice_restart and assert they are cleared on return. Reported by Copilot review on PR #59.
The previous comment claimed that keeping state at ICE_CONNECTED after a dtls_start() failure prevents "a half-initialised DTLS context being driven by inbound bytes". That is misleading: the inbound DTLS gate in handle_input is `state >= NANORTC_STATE_ICE_CONNECTED`, so DTLS records are still accepted on the retry path. The state ordering only governs the FSM; the actual receive-path policy is unchanged. Reword to describe what the reorder actually achieves (FSM stays at ICE_CONNECTED so callers can retry DTLS startup) and explicitly call out that the inbound gate is unchanged, so future readers don't infer a stronger guarantee than exists. Reported by Copilot review on PR #59.
# Conflicts: # src/nano_rtc.c # tests/test_trickle_ice.c
Brackets the unit-level T20/T21 coverage in tests/test_trickle_ice.c with a full ICE -> DTLS round trip on both sides. Existing unit tests probe nanortc_ice_restart() against a hand-built nanortc_t and assert which fields got reset; nothing today drives the end-to-end path of two peers reaching DTLS_CONNECTED, restarting, and re-handshaking with fresh keying material. - e2e_resync_ice_creds() helper: counterpart to e2e_setup_ice_creds() that re-syncs ufrag/pwd and re-arms remote_candidates after both peers have called nanortc_ice_restart(). ice_restart() preserves local_candidates (RFC 8445 §9.1.1.1) but blanks the peer side. - test_e2e_ice_restart_dtls_rehandshake: drives both peers to DTLS_CONNECTED, captures fingerprints, restarts, re-syncs, pumps to reconnect, asserts the new cert hash differs from pre-restart. Catches a regression of 8fea1e8 (without dtls_destroy in nanortc_ice_restart, the second handshake skips dtls_init via the !crypto_ctx guard and reuses the original cert). - test_e2e_ice_restart_sdp_fingerprint_refresh: drives create_offer twice across an ice_restart, asserts sdp.local_fingerprint re-syncs to the freshly generated dtls.local_fingerprint. Catches a regression of 9a75412 (without the fingerprint memset, rtc_cache_fingerprint early-returns on the stale cache and the SDP cache holds the old cert hash while DTLS holds the new one). Validation: each new test was confirmed to fail when its target commit is surgically reverted in the working tree, and to pass on baseline develop. ./scripts/ci-check.sh --fast 15/15 green; full host suite 22/22. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The post-PR-5 follow-up commits surfaced by the Copilot review on PR #59 (d59a528 / 8fea1e8 / 9a75412 / af8f566) were not tracked anywhere — TD-021 closed the original ICE hardening pass and phase8 PR-5 was already marked completed. Add a consolidated TD-022 entry naming each commit, the bug it fixes, and the test that pins it (the unit T20/T21 sites plus the two new e2e tests). Add a "Post-merge hardening" subsection at the tail of phase8 PR-5 with a commit -> failing-test mapping table so future readers can connect the breadcrumb trail without crawling git log. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0x1abin
added a commit
that referenced
this pull request
May 7, 2026
…fs (#62) * feat(video): introduce NANORTC_VIDEO_PKT_RING_SIZE macro + pkt_ring_tail cursor Decouple the video NACK retransmit ring from NANORTC_OUT_QUEUE_SIZE so IoT targets can shrink the NACK window without starving the output dispatch queue. Default equals NANORTC_OUT_QUEUE_SIZE — fully behavior-preserving. - nanortc_config.h: new macro with ifndef guard, ESP-IDF Kconfig hook, power-of-two + min-4 validation. Caller-drain invariant documented. - nanortc.h: resize pkt_ring[] and pkt_ring_meta[] to the new macro, add pkt_ring_tail field (memset-zeroed by nanortc_init). Write-path and NACK-scan updates land in the follow-up commit. Part of Phase 8 PR-3 (docs/exec-plans/active/phase8-continued-optimization.md). * feat(video): wire pkt_ring_tail cursor + honor PKT_RING_SIZE in write + NACK paths - nano_rtc.c video_send_fragment_cb: derive slot from pkt_ring_tail (mask NANORTC_VIDEO_PKT_RING_SIZE - 1) instead of out_tail, advance the cursor after stamping meta. out_queue slot selection inside rtc_enqueue_transmit is unchanged. - nano_rtc.c NACK retransmit scan: iterate over NANORTC_VIDEO_PKT_RING_SIZE so scan bounds track pkt_ring actual size (not the unrelated output queue). - Kconfig: expose CONFIG_NANORTC_VIDEO_PKT_RING_SIZE, defaulting to the existing OUT_QUEUE_SIZE value for backward-compat. Verified: default build (PKT_RING=OUT_QUEUE=32) and override build (-DNANORTC_VIDEO_PKT_RING_SIZE=16 with OUT_QUEUE=32) both build clean and pass 21/21 host tests. Part of Phase 8 PR-3 (docs/exec-plans/active/phase8-continued-optimization.md). * test(video): regression for pkt_ring decoupling + document PKT_RING_SIZE knob Adds three focused unit tests in test_media.c (under #if NANORTC_FEATURE_VIDEO) that mirror the nano_rtc.c write-path / NACK-scan idioms against a zeroed nanortc_t: - in-window lookup: recently-stamped seqs are findable. - out-of-window miss: after 2× ring wrap, old seqs are gone without crash. - independence from out_tail: pkt_ring_tail advances alone; out_tail stays at whatever the caller set; slot 0 holds the newest wrap's seq. Tests pass on both default build (PKT_RING=OUT_QUEUE=32) and IoT override (-DNANORTC_VIDEO_PKT_RING_SIZE=16 with OUT_QUEUE=32), which exercises the wraparound path impossible to hit at the default. Docs: add a "Video NACK ring — tunable independently of the output queue" section to memory-profiles.md covering the macro, example override, safety invariant (Sans-I/O caller-drain contract), and expected ~19 KB saving at PKT_RING_SIZE=16. Tweak the Biggest Contributors and Kconfig tables to point at the new macro. Part of Phase 8 PR-3 (docs/exec-plans/active/phase8-continued-optimization.md). * docs(video): fix inaccurate ~19 KB claim in PKT_RING_SIZE guidance Copilot review on PR #54 flagged that the "override to 8 saves ~19 KB" claim in the Kconfig help text is wrong for ESP-IDF defaults — with OUT_QUEUE_SIZE=16 (video) the 16→8 drop at MEDIA_BUF_SIZE=1232 only saves ~9.6 KB. The ~19 KB figure matches the host-default 32→16 case. Rewrite the Kconfig help text around the formula (OUT_QUEUE_SIZE - PKT_RING_SIZE) × MEDIA_BUF_SIZE so it stays accurate across host and ESP-IDF configurations, with a Kconfig-correct 9.6 KB worked example. memory-profiles.md already targets host overrides, but extend its inline comment to spell out both host (32→16 ≈19 KB) and Kconfig (16→8 ≈9.6 KB) numbers so readers in either context pick the right one. Addresses #54 (comment) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(video): detect pkt_ring overrun + bump stats counter Address PR #54 review concern: when NANORTC_VIDEO_PKT_RING_SIZE is set below the per-frame fragment count, video_send_fragment_cb silently wraps pkt_ring while out_queue[].transmit.data still references the stale slots — the receiver gets newer fragment bytes carrying the older fragment's RTP seq, manifesting as glitched IDRs. Sans-I/O drain doesn't save you because nanortc_send_video() emits every FU-A fragment of one access unit before returning to the caller. Add a runtime guard: when (out_tail - out_head) >= PKT_RING_SIZE, the ring would wrap into a slot whose pointer is still pending. Bump nanortc_t.stats_pkt_ring_overrun (atomic, matches existing stats_* fields) and emit a single static-string NANORTC_LOGW so under-sizing shows up in integration smoke tests rather than as garbled keyframes on the wire. Counter is exported on the struct for app glue to inspect. Cannot prevent the corruption from this layer (the slot is already about to be written), but turning silent-bad-video into a debuggable signal matches the LOG-injection-for-AI-debug use case the rest of the library follows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(video): aliasing regression for pkt_ring overrun detection Mirror video_send_fragment_cb's write path (pkt_ring slot + meta stamp + out_queue pointer enqueue) in test_media.c so the test exercises the actual pointer-aliasing surface, not just the NACK-scan ring semantics. Two new tests: - test_pkt_ring_overrun_counter_fires_when_undersized: pushes PKT_RING_SIZE+3 fragments without draining out_queue, asserts the counter trips exactly 3 times — guards the contract enforced in nano_rtc.c. - test_pkt_ring_aliasing_corruption_demonstrated_without_guard: documents *why* the guard exists. Without draining, out_queue[0]'s .data pointer still addresses pkt_ring[0], which has since been overwritten by the wrapped fragment. Asserts the byte at .data[0] is the latest writer's marker, not the original. This is the silent-corruption pattern the runtime guard surfaces. Both pass under default config (PKT_RING=OUT_QUEUE=32, no overrun because out_queue fills first) and under the IoT override (-DNANORTC_VIDEO_PKT_RING_SIZE=8 -DNANORTC_OUT_QUEUE_SIZE=32, where the overrun path actually fires). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(video): tighten PKT_RING_SIZE sizing guidance to per-frame fragment bound Replace the "drain each tick" framing in Kconfig + memory-profiles.md with the precise hard constraint: PKT_RING_SIZE >= ceil(max_frame_bytes / NANORTC_VIDEO_MTU) + 1 The previous wording implied the Sans-I/O drain contract was sufficient, but nanortc_send_video() emits every FU-A fragment of one access unit before returning to the caller — there is no opportunity to drain mid-frame. Sizing PKT_RING_SIZE below the per-frame fragment count silently corrupts every IDR (older fragment buffers are overwritten while their pointers are still queued in out_queue). Add a worked-numbers table for 480p/720p/1080p H.264 at MTU 1200, and soften the "500 ms NACK history at 30 fps" claim — that figure only held for single-NAL frames; multi-NAL streams cover much less wall- clock time per slot. Cross-reference the new stats_pkt_ring_overrun counter so integrators can wire it into smoke tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(phase8): mark PR-3 completed, record review-driven design change Close out PR-3 in the Phase 8 exec plan. Key corrections vs the original draft: - The draft step 3 recommended "PKT_RING=16 at 30 fps = ~500 ms NACK window" as the IoT sizing rule. That claim only held for single-NAL frames — on 720p+ streams with multi-NAL IDRs it silently corrupted every keyframe because the ring would wrap while out_queue[].data pointers still referenced the overwritten slots. Replaced with the real hard constraint: PKT_RING_SIZE >= ceil(max_frame_bytes / NANORTC_VIDEO_MTU) + 1 plus a worked-numbers table (480p ≈ 16, 720p ≈ 32, 1080p ≈ 64), landing in memory-profiles.md. - Added a runtime overrun guard + stats_pkt_ring_overrun counter so under-sizing surfaces in integration smoke tests rather than as glitched IDRs on the wire. - Grew the test count from the planned single test_nack_ring to five tests in test_media.c, including an aliasing-corruption witness that fails without the guard. Also update docs/PLANS.md line 16 from "Planning" to track PR-3 and PR-5 as landed so the phase's next-action surface is accurate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(video): clarify pkt_ring sizing rule + overrun counter wording Address Copilot review on PR #57: - nanortc_config.h: replace the misleading "drain each tick" framing on NANORTC_VIDEO_PKT_RING_SIZE with the actual hard sizing rule (ceil(max_frame_bytes / VIDEO_MTU) + 1). nanortc_send_video() emits every FU-A fragment of an access unit before returning, so callers cannot drain mid-frame; sizing must be based on worst-case fragments-per-frame, and savings depend on OUT_QUEUE_SIZE rather than a fixed ~19 KB number. - Kconfig: rephrase "Set the runtime overrun counter" to "Monitor stats_pkt_ring_overrun" — the counter is incremented internally and is only readable by integrators, not user-settable. - tests/test_media.c: rename test_pkt_ring_aliasing_corruption_demonstrated_without_guard to test_pkt_ring_aliasing_corrupts_pending_pointers_when_undersized. The previous name implied the helper bypassed the guard, but the test uses pkt_ring_simulate_send_with_guard() and the guard does fire — what the test actually demonstrates is that the runtime guard *cannot prevent* corruption once PKT_RING_SIZE is undersized; it only surfaces the problem via stats_pkt_ring_overrun. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(video): extract pkt_ring_alloc_slot + commit_slot helpers The pkt_ring overrun-guard + slot-allocation pattern, plus the matching NACK metadata stamp and pkt_ring_tail advance, were duplicated between the H.264 zero-copy loop in rtc_send_video (added by PR #56 / fbb5b4f) and the H.265 callback video_send_fragment_cb (added by PR #54 / c1cb699). The verbose comment block explaining the aliasing-corruption invariant lived only at the H.265 site, leaving the H.264 site under-documented. Extract: static uint8_t *pkt_ring_alloc_slot(nanortc_t *, uint16_t *out_pslot) — runs the (out_tail - out_head) >= PKT_RING_SIZE check, bumps stats_pkt_ring_overrun + emits NANORTC_LOGW on overrun, returns rtc->pkt_ring[pslot]. static void pkt_ring_commit_slot(nanortc_t *, uint16_t pslot, uint16_t wire_seq, uint16_t srtp_len) — stamps pkt_ring_meta and increments pkt_ring_tail. Both helpers live at the top of the #if NANORTC_FEATURE_VIDEO block, ahead of the H.265 nesting, so both writers see the same definition. The aliasing-corruption rationale now lives once at pkt_ring_alloc_slot where both callers can find it. Behavior identical: same guard logic, same overrun counter increment, same slot index derivation, same meta layout. Verified across the default + ASan + ICE_SRFLX-off + H.265 feature combos (21 / 21 host tests, 23 / 23 with H.265 enabled, 15 / 15 ci-check.sh --fast). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(phase8): mark PR-4 completed and refresh PR-3 record - docs/PLANS.md line 16: Phase 8 status now lists PR-4 as landed 2026-04-25 (PR #56 / fbb5b4f), matches PR-3 / PR-5 entries. - docs/exec-plans/active/phase8-continued-optimization.md: - Tag PR-4 as [COMPLETED 2026-04-25] in both the summary table and the section header. - Replace the speculative "Approach" bullets with the landed approach (Option C — fragment iterator), measured savings, test additions, and verification record. - PR-3 step 5: note the runtime overrun guard was helper-extracted into pkt_ring_alloc_slot in PR #57 and is now shared between the H.264 and H.265 writers. - PR-3 step 6: rename aliasing_corruption_demonstrated_without_guard to aliasing_corrupts_pending_pointers_when_undersized to match the test rename in ba32abf. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(sctp): emit DISCONNECTED on retransmit exhaustion nsctp_handle_timeout() transitions SCTP to CLOSED after exceeding NANORTC_SCTP_MAX_RETRANSMITS but never notified the upper layer — applications only learned about the loss on the next nanortc_datachannel_send() failure, a silent stall from the user's point of view. Phase 8 PR-2. Wire the signal through: - nano_sctp.h: new bool closed_due_to_failure on nano_sctp_t, set exactly once when retransmit exhaustion trips the CLOSED transition. Init-time CLOSED (line 480) and peer-ABORT-induced CLOSED (line 982) deliberately do NOT set this — they have separate signaling channels; rationale recorded inline. - nano_sctp.c: set the flag alongside the existing state=CLOSED in the retransmit-exhaustion branch. - nano_rtc.c: in rtc_process_timers, after the SCTP timer block, read and clear the flag. If the RTC was previously NANORTC_STATE_CONNECTED, emit NANORTC_EV_DISCONNECTED and transition to NANORTC_STATE_CLOSED. Mirrors the existing ICE-failure / consent-expiry paths. Critical detail: emit block lives outside the if (sctp.state == ESTABLISHED) guard — once the timeout flips state to CLOSED, that guard goes false on the next tick and the signal would otherwise be lost. - tests/test_sctp.c: * test_sctp_timeout_sets_closed_flag (under #if NANORTC_FEATURE_DC_RELIABLE) walks the full RTO ramp (1000 → 2000 → 4000 → 8000 → 10000 capped) through MAX_RETRANSMITS cycles of nsctp_handle_timeout + nsctp_poll_output, asserts CLOSED + flag on the threshold-trip iteration. * test_sctp_abort_does_not_set_failure_flag — scope guard ensuring peer ABORT does not regress into the failure flag. Verified across DC_RELIABLE=ON (test_sctp 56/56), DC_RELIABLE=OFF (test_sctp 55/55, Test 1 correctly skipped), and scripts/ci-check.sh --fast (15/15 incl. ASan + ICE_SRFLX=OFF + format). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(phase8): mark PR-2 completed - docs/PLANS.md line 16: Phase 8 status now lists PR-2 (SCTP failure event propagation) as landed 2026-04-25, alongside the existing PR-3 / PR-4 / PR-5 entries; only PR-1 (IoT memory profile) remains pending in the recommended-order set. - docs/exec-plans/active/phase8-continued-optimization.md: - Tag PR-2 as [COMPLETED 2026-04-25] in the section header. - Replace the speculative "Approach" bullets with the landed implementation: closed_due_to_failure flag, RTC emit block placed outside the ESTABLISHED guard (the non-obvious detail), the two new regression tests, and verification record (test_sctp 56/56 with DC_RELIABLE=ON, 55/55 with DC_RELIABLE=OFF, ci-check.sh --fast 15/15). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ci): expand fuzz job to fuzz_h265 + fuzz_turn Both harnesses are already registered in tests/fuzz/CMakeLists.txt (fuzz_turn in FUZZ_CORE_TARGETS, fuzz_h265 under FUZZ_H265_TARGETS, gated by NANORTC_FEATURE_H265). The CI fuzz job already configures H265=ON, so adding the two harnesses to the run loop is the only remaining step. Closes review §4 P1-1 (docs/reviews/2026-04-26-overall-review.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci(interop): add nanortc-as-TURN-client relay job Spins up the existing tests/interop/turn-server/docker-compose.yml coturn (via scripts/start-test-turn.sh) and runs ctest -L turn-relay against test_interop_turn_relay_nanortc.c (5 cases). Phase 5.2 F6-F10 regressions are now caught in CI; previously they only had hand verification on a downstream camera SDK. Updates docs/engineering/turn-rfc-compliance.md to reflect the new coverage status (the 'no automated test' note was stale once the 744-LOC harness landed). Closes review §4 P1-2 + P1-5 (docs/reviews/2026-04-26-overall-review.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rtc): defer DTLS_HANDSHAKING state until dtls_start succeeds rtc_begin_dtls_handshake() previously set rtc->state to DTLS_HANDSHAKING before calling dtls_start(). On dtls_start failure the function returned early but state was already advanced — leaving a half-initialised DTLS context that subsequent handle_input() bytes would drive down the DTLS path. Reorder so state is only advanced after dtls_start() succeeds; failure leaves state at ICE_CONNECTED so the caller can retry cleanly. Failure-path mock test deferred until a mock crypto provider lands (no current way to make dtls_init / dtls_start fail in tests). Closes review §4 P1-4 (docs/reviews/2026-04-26-overall-review.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rtc): tear down DTLS context on ice_restart Without explicit dtls_destroy() in nanortc_ice_restart(), the "if (!rtc->dtls.crypto_ctx)" guard in rtc_begin_dtls_handshake would skip dtls_init on the next handshake and reuse the previous DTLS context — possible key-material reuse / orphan handshake timers. dtls_destroy is a no-op when crypto_ctx is NULL, so calling it unconditionally is safe even before any DTLS init. Adds two regression tests: - test_api_ice_restart_clears_dtls: dtls_init -> ice_restart asserts crypto_ctx is NULL and dtls.state is CLOSED post-restart. - test_api_ice_restart_no_dtls_safe: ice_restart without prior dtls_init does not crash. Closes review §4 P2-3 (docs/reviews/2026-04-26-overall-review.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(crc): add RFC 1952 / 3309 / 3720 vector tests The CRC modules (nano_crc32.c / nano_crc32c.c) carried 100% line coverage indirectly via fuzz_stun (FINGERPRINT verify) and fuzz_sctp (chunk checksum), but had no direct algorithm-vector test. An accidental polynomial / initial-value / final-XOR change could pass indirect fuzz round-trip while still being algorithmically wrong. Adds 13 cases pinning canonical RFC vectors: - CRC-32 ISO HDLC (RFC 1952): empty, "123456789" (0xCBF43926), "a" (0xE8B7BE43), "abc" (0x352441C2) - CRC-32c Castagnoli (RFC 3309 / RFC 3720 §B.4): empty, "123456789" (0xE3069283), 32x0x00 (0x8A9136AA), 32x0xFF (0x62A8AB43), incrementing/decrementing 32-byte buffers - Incremental API: segmented == one-shot (the 3-segment pattern used by nsctp_verify_checksum), zero-length update tolerance, byte-at-a-time fragmentation Closes review §4 P3-1 (docs/reviews/2026-04-26-overall-review.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tests): gate test_crc32 on DATACHANNEL test_crc32.c calls nano_crc32c{,_init,_update,_final}, which live in src/nano_crc32c.c and are only compiled when NANORTC_FEATURE_DATACHANNEL is ON. Without gating, AUDIO_ONLY / MEDIA_ONLY / CORE_ONLY builds and the architecture-constraints DC-OFF check fail to link. Add test_crc32 to the DC_TESTS list so the existing continue() guard skips it in DC-OFF configs, mirroring test_datachannel. The 13 CRC assertions still run in DATA / AUDIO / MEDIA / MEDIA_H265 — RFC vectors are configuration-invariant, so no coverage is lost. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci(esp-idf): add compile-only build workflow for examples Builds esp32_datachannel across esp32 / esp32s3 / esp32p4 and esp32_audio + esp32_video on esp32s3 via espressif/esp-idf-ci-action on push/PR to main. Catches component / Kconfig / lwIP regressions without requiring hardware; on-device Python E2E remains manual. esp32_camera is intentionally excluded — its board_manager autogen + camera-sensor selection is board-specific and will land separately. * fix(esp32_video): pin default flash size to 4MB in sdkconfig.defaults The custom partition table reserves a 0x3a0000 (~3.6 MB) factory app, ending at 0x3b0000. ESP-IDF's default flash size is 2 MB, so a clean build from sdkconfig.defaults fails partition-table generation with: Partitions tables occupies 3.7MB of flash (3866624 bytes) which does not fit in configured flash size 2MB. The locally-tracked sdkconfig was set to 16 MB, which masked this in manual builds; the new ESP-IDF CI workflow starts from defaults and caught the regression. Set CONFIG_ESPTOOLPY_FLASHSIZE_4MB=y as the minimum size that fits the partitions; boards with larger flash override via menuconfig. Verified: gen_esp32part.py with --flash-size 4MB succeeds; with 2MB reproduces the CI error exactly. * ci(esp-idf): fetch submodules in checkout for esp32_video sample data esp32_video's main/CMakeLists.txt packs frames from examples/sample_data/h264SampleFrames and opusSampleFrames into video.blob / audio.blob at build time. examples/sample_data is a git submodule (0x1abin/media_sample_data); without `submodules: recursive`, actions/checkout@v4 leaves the path empty and the build fails: Error: examples/sample_data/h264SampleFrames is not a directory Error: examples/sample_data/opusSampleFrames is not a directory The other matrix targets don't reference sample_data, but enabling submodules at the workflow level keeps the step uniform; the submodule is ~15 MB so the cost is negligible. * fix(rtc): clear cached fingerprints on ice_restart dtls_destroy() drops the DTLS context but leaves rtc->sdp.local_fingerprint and rtc->dtls.local_fingerprint populated. rtc_cache_fingerprint() is a write-once cache (early-returns when sdp.local_fingerprint is non-empty), so the next create_offer / accept_offer after an ICE restart would re-advertise the *previous* cert's SDP fingerprint while a fresh dtls_init() generates a new cert and fingerprint. The peer then rejects DTLS handshake with a fingerprint mismatch. Memset both fingerprints to '\0' immediately after dtls_destroy() so rtc_cache_fingerprint() repopulates them from the next dtls_init's cert. Test: extend test_api_ice_restart_clears_dtls (T20) — pre-populate both fingerprint buffers before calling nanortc_ice_restart and assert they are cleared on return. Reported by Copilot review on PR #59. * docs(rtc): correct rtc_begin_dtls_handshake state-failure comment The previous comment claimed that keeping state at ICE_CONNECTED after a dtls_start() failure prevents "a half-initialised DTLS context being driven by inbound bytes". That is misleading: the inbound DTLS gate in handle_input is `state >= NANORTC_STATE_ICE_CONNECTED`, so DTLS records are still accepted on the retry path. The state ordering only governs the FSM; the actual receive-path policy is unchanged. Reword to describe what the reorder actually achieves (FSM stays at ICE_CONNECTED so callers can retry DTLS startup) and explicitly call out that the inbound gate is unchanged, so future readers don't infer a stronger guarantee than exists. Reported by Copilot review on PR #59. * test(rtc): add e2e regression tests for ice_restart hardening Brackets the unit-level T20/T21 coverage in tests/test_trickle_ice.c with a full ICE -> DTLS round trip on both sides. Existing unit tests probe nanortc_ice_restart() against a hand-built nanortc_t and assert which fields got reset; nothing today drives the end-to-end path of two peers reaching DTLS_CONNECTED, restarting, and re-handshaking with fresh keying material. - e2e_resync_ice_creds() helper: counterpart to e2e_setup_ice_creds() that re-syncs ufrag/pwd and re-arms remote_candidates after both peers have called nanortc_ice_restart(). ice_restart() preserves local_candidates (RFC 8445 §9.1.1.1) but blanks the peer side. - test_e2e_ice_restart_dtls_rehandshake: drives both peers to DTLS_CONNECTED, captures fingerprints, restarts, re-syncs, pumps to reconnect, asserts the new cert hash differs from pre-restart. Catches a regression of 8fea1e8 (without dtls_destroy in nanortc_ice_restart, the second handshake skips dtls_init via the !crypto_ctx guard and reuses the original cert). - test_e2e_ice_restart_sdp_fingerprint_refresh: drives create_offer twice across an ice_restart, asserts sdp.local_fingerprint re-syncs to the freshly generated dtls.local_fingerprint. Catches a regression of 9a75412 (without the fingerprint memset, rtc_cache_fingerprint early-returns on the stale cache and the SDP cache holds the old cert hash while DTLS holds the new one). Validation: each new test was confirmed to fail when its target commit is surgically reverted in the working tree, and to pass on baseline develop. ./scripts/ci-check.sh --fast 15/15 green; full host suite 22/22. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(phase8): record post-PR-5 ice_restart hardening as TD-022 The post-PR-5 follow-up commits surfaced by the Copilot review on PR #59 (d59a528 / 8fea1e8 / 9a75412 / af8f566) were not tracked anywhere — TD-021 closed the original ICE hardening pass and phase8 PR-5 was already marked completed. Add a consolidated TD-022 entry naming each commit, the bug it fixes, and the test that pins it (the unit T20/T21 sites plus the two new e2e tests). Add a "Post-merge hardening" subsection at the tail of phase8 PR-5 with a commit -> failing-test mapping table so future readers can connect the breadcrumb trail without crawling git log. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(h265): wire receive-path demux + first H.265 video e2e PR-2 stage B closes the receive half of the H.265 wiring so two nanortc instances can roundtrip H.265 end-to-end: - Per-track depacketizer becomes a codec-selected union (track.video.depkt.{h264,h265}) in src/nano_media.h; the H.265 arm is conditional on NANORTC_FEATURE_H265 so OFF builds keep the prior sizeof(nanortc_track_t). - track_init() dispatches on m->codec to call h265_depkt_init() for H.265 video tracks, else falls back to h264_depkt_init(). - nano_rtc_media_handle_rtp_or_rtcp() RTP demux dispatches to h265_depkt_push() + h265_is_keyframe() when the track is H.265; the single NANORTC_EV_MEDIA_DATA emit site is shared because the event payload is codec-neutral. Tests: - test_h265_track_init_dispatches_to_h265_depkt drives the same FU three-fragment vector through track_init + the union accessor as the standalone module test, proving dispatch path == direct path. - test_e2e_h265_loopback is the project's first true two-instance video e2e: full SDP/ICE/DTLS, one IDR Annex-B AU sent, answerer surfaces byte-identical payload with is_keyframe asserted. Stage A (SDP plumbing + send path) shipped 2026-04-16 during the browser interop hardening session; phase3-5-h265.md now records the staged-landing narrative and checks all PR-2 acceptance criteria. PR-3 (libdatachannel interop + browser + doc finalization) pending. --------- Co-authored-by: Claude <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
nanortc_ice_restartnow tears down the DTLS context, andrtc_begin_dtls_handshakeonly advances state afterdtls_startsucceeds (review §4 P1-4 / P2-3).fuzz_h265+fuzz_turn; new interop job runs Phase 5.2 TURN-relay regression suite against coturn.Test plan
interop-turn-relayjob green (5 ctest cases)./scripts/ci-check.sh --fastclean before merge🤖 Generated with Claude Code